Sampling-based approximate skyline calculation on big data
Authors
Abstract
Nowadays, big data is coming to the fore in many applications. Processing a skyline query on big data in more than linear time is by far too expensive, and often even linear time may be too slow. It is obviously not possible to compute an exact solution to a skyline query in sublinear time, since an exact solution may itself have linear size. Fortunately, in many situations a fast approximate solution is more useful than a slower exact one. This paper proposes two sampling-based approximate algorithms for processing skyline queries. The first algorithm obtains a fixed-size sample and computes the approximate skyline on it. Its error is not only relatively small in most cases but also almost unaffected by the input size. The second algorithm efficiently returns an [Formula: see text]-approximation of the exact skyline. In practice, its running time is independent of the input size, achieving the goal of sublinearity on big data. Experiments verify the error analysis of the first algorithm and show that the second algorithm is much faster than existing skyline algorithms.
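The idea behind the first algorithm (draw a fixed-size sample, then compute the skyline of the sample alone) can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: smaller values are preferred in every dimension, the sample is drawn uniformly without replacement, and the sample's skyline is computed with a plain block-nested-loops (BNL) pass. The names `dominates`, `skyline`, and `approximate_skyline` are hypothetical. This is not the authors' implementation, and it does not reproduce the second algorithm's approximation guarantee.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical sketch: assumes "smaller is better" in every dimension
# and uniform sampling without replacement; not the paper's algorithm.
Point = Tuple[float, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """True if p is no worse than q in every dimension and strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Exact skyline via block-nested-loops: keep a window of non-dominated points."""
    window: List[Point] = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue  # p is dominated by a point already kept, so discard it
        # p survives: evict window points that p dominates, then keep p
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


def approximate_skyline(points: Sequence[Point], sample_size: int,
                        seed: int = 0) -> List[Point]:
    """Draw a fixed-size uniform sample and return the skyline of the sample only."""
    rng = random.Random(seed)
    k = min(sample_size, len(points))
    sample = rng.sample(list(points), k)
    return skyline(sample)


if __name__ == "__main__":
    # Usage: compare the sample skyline with the exact skyline on random 2-D data.
    rng = random.Random(42)
    data = [(rng.random(), rng.random()) for _ in range(20_000)]
    exact = set(skyline(data))
    approx = approximate_skyline(data, sample_size=1_000)
    hits = sum(1 for p in approx if p in exact)
    print(f"exact: {len(exact)}, approx: {len(approx)}, also in exact: {hits}")
```

Because only the sample is examined, a point returned by `approximate_skyline` may still be dominated by an unsampled point, and exact skyline points outside the sample are missed. The sketch shows only the mechanics and makes no claim about the error bounds analyzed in the paper.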
Similar resources
Sampling Based Range Partition Methods for Big Data Analytics
Big Data Analytics requires partitioning datasets into thousands of partitions according to a specific set of keys so that different machines can process different partitions in parallel. Range partition is one of the ways to partition the data that is needed whenever global ordering is required. It partitions the data according to a pre-defined set of exclusive and continuous ranges that cover...
Skyline Computation on Commercial Data
• Our data set contains data on 55208 cars [1].
• To each car, 23 attributes are assigned.
  – correlated (e.g., cylinders and engine size).
  – anti-correlated (e.g., mileage and registration date).
  – nearly independent (e.g., mileage and horsepower).
• Outliers countervail correlation effects.
• Cardinalities differ greatly, e.g.:
  – 5988 different values for attribute price.
  – only 17 different v...
Error-bounded Sampling for Analytics on Big Sparse Data
Aggregation queries are at the core of business intelligence and data analytics. In the big data era, many scalable shared-nothing systems have been developed to process aggregation queries over massive amounts of data. Microsoft’s SCOPE is a well-known instance in this category. Nevertheless, aggregation queries are still expensive, because query processing needs to consume the entire data set, ...
Data Interpolation: An Efficient Sampling Alternative for Big Data Aggregation
Given a large set of measurement sensor data, in order to identify a simple function that captures the essence of the data gathered by the sensors, we suggest representing the data by (spatial) functions, in particular by polynomials. Given a (sampled) set of values, we interpolate the datapoints to define a polynomial that would represent the data. The interpolation is challenging, since in pr...
Importance Sampling Algorithms for Belief Networks based on Approximate Computation
In this paper we study a new general class of algorithms for the propagation of probabilities on graphical structures based on importance sampling techniques. The idea is to make an approximate and fast propagation in order to obtain a sampling distribution as close as possible to the true one. Our proposal is based on a deletion sequence of the variables to calculate the 'a posteriori' probabi...
Journal
Journal title: Discrete Mathematics, Algorithms and Applications
Year: 2021
ISSN: 1793-8309, 1793-8317
DOI: https://doi.org/10.1142/s1793830922500240